ABSTRACT

In data management, and in particular in data integration, data exchange, query optimization, and data privacy, the notion of view plays a central role. In several contexts, such as data integration, data mashups, and data warehousing, the need arises to design views starting from a set of known correspondences between queries over different schemas. In this paper we deal with the issue of automating such a design process. We call this novel problem "view synthesis from schema mappings": given a set of schema mappings, each relating a query over a source schema to a query over a target schema, automatically synthesize for each source a view over the target schema in such a way that for each mapping, the query over the source is a rewriting of the query over the target wrt the synthesized views. We study view synthesis from schema mappings both in the relational setting, where queries and views are (unions of) conjunctive queries, and in the semistructured data setting, where queries and views are (two-way) regular path queries, as well as unions of conjunctions thereof. We provide techniques and complexity upper bounds for each of these cases.

1. INTRODUCTION 

A view is essentially a (virtual or materialized) data set that is known to be the result of executing a specific query over an underlying database. There are several data-management tasks where the notion of view plays an important role [21]:

• In database design, following the well-known principle of data independence, views may be used to provide a logical description of the storage schema (cf. [33]). In this setting, since queries are expressed at the logical level, computing a query plan over the physical storage involves deciding how to use the views in the query-answering process.

• In query optimization [13], the computation of the answer set to a query may take advantage of materialized views, because part of the data needed for the computation may already be available in the view extensions.

• In data privacy, authorization views associated with a user represent the data that such user is allowed to access [35]. When the system computes the result of a query posed by a specific user, only those answers deriving from the content of the corresponding authorization views are provided to the user.

• In data integration, data warehousing, and data exchange, a target schema represents the information model used either to access or to materialize the data residing in a set of sources [24, 25]. In these contexts, views are used to characterize the semantics of the data sources in terms of the elements of the target schema, and answering target queries amounts to suitably accessing the views.

The above discussion points out that techniques for using the available views when computing the answers to queries are needed in a variety of data management scenarios. Query processing using views is defined as the problem of computing the answer to a query by relying on the knowledge about a set of views, where by "knowledge" we mean both view definitions and view extensions [22].

View-based query processing. Not surprisingly, the recent database literature witnesses a proliferation of methods, algorithms, and complexity characterizations for this problem. Two approaches have emerged, namely, query rewriting and query answering. In the former approach, the goal is to reformulate the query into an expression that refers to the views (or only to the views), and provides the answer to the query when evaluated over the view extensions. In the latter approach, one aims at computing the so-called certain answers, i.e., the tuples satisfying the query in all databases consistent with the views.

Query rewriting has been studied in relational databases for the case of conjunctive queries and many of their variants, both with and without integrity constraints (see a survey in [22]). A comprehensive framework for view-based query answering in relational databases, as well as several interesting complexity results for different query languages, are presented in [5, 19].

View-based query processing has also been addressed in the context of semistructured databases. In the case of graph-based models, the problem has been studied for the class of regular path queries and its extensions (see, for example, [9, 20]). In the case of XML-based models, results on both view-based query rewriting and view-based query answering are reported for several variants of the XPath query language (see, for example, [4, 12]).

Where do the views come from? All the above works assume that the set of views to be used during query processing is available. Therefore, a natural question arises: where do these views come from? Some recent papers address this issue from different points of view. In [Ts], the authors introduce the so-called "view definition problem": given a database instance and a corresponding view instance, find the most succinct and accurate view definition for a specific view definition language. Algorithms and complexity results are reported for several families of languages. (Note that the problem dealt with in [32] can be seen as a variant of the view definition problem.)

In the context of both query optimization and data warehousing, there has been a lot of interest in the so-called "view-selection problem" [14], that is, the problem of choosing a set of views to materialize over a database schema such that the cost of evaluating a set of workload queries is minimized and the views fit into a prespecified storage constraint. Note that the input to an instance of this problem includes knowledge about both a set of queries that the selected views should support and a set of constraints on space limits for the views.

In data integration and exchange, the "mapping discovery problem" has received significant attention in recent years: find correspondences between a set of data sources and a target (or, global) schema so that queries posed to the target can be answered by exploiting such mappings and accessing the sources accordingly. Several types of mappings have been investigated in the literature [25]. In particular, in the so-called LAV (Local-As-View) approach, mappings associate to each source a view over the target schema. In other words, the LAV approach to data integration and exchange advocates the idea of modeling each source as a view.

The problem of semi-automatically discovering mappings has been addressed by both the database and the AI communities [28, 18]. In [so], a theoretical framework is presented for discovering relationships between two database instances over distinct schemata. In particular, the problem of understanding the relationship between two instances is formalized as that of obtaining a schema mapping such that a minimum repair of this mapping provides a perfect description of the target instance.

In […], the authors describe the iMAP system, which semi-automatically discovers both 1-1 and complex matches between different data schemas, where a match specifies semantic correspondences between elements of both schemas and is therefore analogous to a mapping. None of the above papers addresses the issue of automatically deriving LAV mappings. This implies that none of the methods described in those papers can be used directly to derive the view definitions associated with the data sources.

Synthesizing views from schema mappings. In this paper, we tackle the problem of deriving view definitions from a different angle. We assume that we have as input a set of schema mappings, i.e., a set of correspondences between a source schema and a target schema, where each correspondence relates a source query (i.e., a query over the sources) to a target query. The goal is to automatically synthesize one view for each source relation, in such a way that all schema mappings are captured. We use two interpretations of a "schema mapping captured by the synthesized views". Under the former interpretation, the schema mapping is captured if the source query of such mapping is a nonempty, sound rewriting of the target query with respect to the views. Under the latter interpretation, the mapping is captured if the source query is an exact rewriting of the target query with respect to the views. We remind the reader that, given a set of views $\mathcal{V}$, a query $q_v$ over the set of view symbols in $\mathcal{V}$ is called a sound (exact) rewriting of a target query $q_t$ with respect to $\mathcal{V}$ if, for each target database that is coherent with the extensions of the views $\mathcal{V}$, the result of evaluating $q_v$ over the view extensions is a subset of (equal to) the result of evaluating $q_t$ over the target database.

We call this problem (exact) view synthesis from schema mappings. We also refer to the decision problem associated with view synthesis, called (exact) view existence: check whether there exists a set of views, one for each source, that captures all the schema mappings.

The view-synthesis problem is relevant in several scenarios. We briefly discuss some of them.

• In data warehousing, based on the consideration that business value can be returned as quickly as the first data marts can be created, the project often starts with the design of a few data marts, rather than with the design of the complete data warehouse schema. Designing a data mart involves deciding how data extracted from the sources populate the data warehouse concepts that are relevant for that data mart [23]. In this context, view synthesis amounts to deriving, from a set of specific data marts already defined, a set of LAV mappings from the data sources to the elements of the data warehouse. With such mappings at hand, the design of further data marts is greatly simplified: it is sufficient to characterize the content of the new data mart in terms of a query over the virtual warehouse, and the extraction program will be automatically derived by rewriting the query with respect to the synthesized views.

• Similar to the case described above, real-world information-integration projects start by designing wrappers, i.e., processes that extract data from the sources and provide single services for the user. This is typically the scenario of portal design, where data integration is performed on a query-by-query basis. Each query is wrapped into a service, and each time this service is invoked through the portal, the extraction program is activated, and the specific data integration task associated with it is performed. A much more modular, extensible, and reusable architecture is one where a full-fledged data integration system, comprising the global (or, target) schema and the mapping to the sources, replaces this query-by-query architecture. View synthesis provides the technique to automatically derive such a data integration system. Indeed, if the various services are characterized in terms of queries over a target alphabet, the combination of wrappers and the corresponding queries over the target form a set of schema mappings, from which the view-synthesis algorithm produces the LAV mappings that constitute the data integration system.

• Recently, there has been some interest in so-called data mashups. A mashup is a web application that combines data or functionality from a collection of external sources to create a new information service [17]. Describing the semantics of such a service means describing it as a query over a domain-specific alphabet. Once this has been done, the mashup is essentially characterized as a schema mapping from the external sources to a virtual global database. So, similarly to the above mentioned cases, view synthesis can be used to turn the set of mashups into a full-fledged LAV data integration system, with all the advantages pointed out before.

In all the above scenarios, view synthesis is used for deriving a set of LAV mappings starting from a set of available schema mappings. This is not surprising since, as we said before, in the LAV approach sources are modeled as views. Nevertheless, one might wonder why one should derive LAV mappings rather than directly use the original schema mappings for data warehousing, integration, and mashups. There are several reasons why one is interested in LAV mappings:

• LAV mappings allow one to exploit the algorithms and techniques that have been developed for view-based query processing in recent years.

• Several recent papers point out that the language of LAV mappings enjoys many desirable properties. For example, in […] it is shown that LAV mappings always admit universal solutions, allow the rewriting of unions of conjunctive queries over the target into unions of conjunctive queries over the sources, and are closed both under target homomorphism and union. Recently, LAV mappings have also been shown to be closed under composition, and to admit polynomial-time recoverability checking [15].

• LAV mappings allow a characterization of the sources in terms of the elements of the target schema and, therefore, are crucial in all the scenarios where a precise understanding and a formal documentation of the content of the sources are needed.

Contributions of the paper. In this paper we propose a formal definition of the view-synthesis and the view-existence problems, and present the first study of such problems, both in the context of the relational model and in the context of semistructured data.

For relational data, we address the case where queries and views are both conjunctive queries, and the case where they are unions of conjunctive queries. In the former case, we show that both view existence and exact view existence are in NP. In the latter case, we show that both problems are in $\Pi_2^p$.

In the context of semistructured data, we refer to a graph-based data model, as opposed to the popular XML-based model. The reason is that in many interesting scenarios, including the ones where XML data are used with refids, semistructured data form a graph rather than a tree. For graph-based semistructured data, we first study view synthesis and view existence in the cases where queries and views are regular path queries. We first present a technique for view existence based on automata on infinite trees [34], and provide an ExpTime upper bound for the problem. We then illustrate an alternative technique based on the characterization of regular languages by means of left-right congruence classes. Such a characterization allows us to prove an ExpSpace upper bound for the exact view-existence problem. Finally, by exploiting a language-theoretic characterization for containment of regular path queries with inverse (called two-way regular path queries) provided in [10], we extend the congruence-class-based technique to the case where queries and views are two-way regular path queries, as well as conjunctive two-way regular path queries, and unions of such queries.

Organization of the paper. The paper is organized as follows. In Section 2 we recall some preliminary notions. In Section 3 we formally define the problem of view synthesis from schema mappings, and the problem of view existence. In Section 4 we study the problem in the case where queries and views are conjunctive queries, and unions thereof. In Section 5 and Section 6 we illustrate the techniques for the view-synthesis problem in the case of RPQs over semistructured data. Section 7 extends the technique to two-way RPQs and to (unions of) conjunctive two-way RPQs. Section 8 concludes the paper.

2. PRELIMINARIES 

In this work we deal with two data models, the standard 
relational model [3], and the graph-based semistructured 
data model [§]. 

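Given a (relational) alphabet $\Sigma$, a database $\mathcal{D}$ over $\Sigma$, and a query $q$ over $\Sigma$, we denote with $q^{\mathcal{D}}$ the set of tuples resulting from evaluating $q$ in $\mathcal{D}$. A query $q$ over $\Sigma$ is empty if for each database $\mathcal{D}$ over $\Sigma$ we have $q^{\mathcal{D}} = \emptyset$. Given two queries $q_1$ and $q_2$ over $\Sigma$, we say that $q_1$ is contained in $q_2$, denoted $q_1 \subseteq q_2$, if $q_1^{\mathcal{D}} \subseteq q_2^{\mathcal{D}}$ for every database $\mathcal{D}$ over $\Sigma$. The queries $q_1$ and $q_2$ are equivalent, denoted $q_1 \equiv q_2$, if both $q_1 \subseteq q_2$ and $q_2 \subseteq q_1$.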

We assume familiarity with the relational model and with 
(unions of) conjunctive queries, (U)CQs, over a relational 
database. Below we recall the basic notions regarding the 
graph-based semistructured data model and regular path 
queries. 

A semistructured database is a finite graph whose nodes represent objects and whose edges are labeled by elements from an alphabet of binary relation symbols [6, 7, 10]. An edge $(o_1, r, o_2)$ from object $o_1$ to object $o_2$ labeled by $r$ represents the fact that relation $r$ holds between $o_1$ and $o_2$. A regular path query (RPQ) over an alphabet $\Sigma$ of binary relation symbols is expressed as a regular expression or a nondeterministic finite word automaton (NWA) over $\Sigma$. When evaluated on a (semistructured) database $\mathcal{D}$ over $\Sigma$, an RPQ $q$ computes the set $q^{\mathcal{D}}$ of pairs of objects connected in $\mathcal{D}$ by a path in the regular language $\mathcal{L}(q)$ defined by $q$. Containment between RPQs can be characterized in terms of containment between the corresponding regular languages: given two RPQs $q_1$ and $q_2$, we have that $q_1 \subseteq q_2$ iff $\mathcal{L}(q_1) \subseteq \mathcal{L}(q_2)$ [9].
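To make the RPQ semantics concrete, the following Python sketch computes $q^{\mathcal{D}}$ by a breadth-first search over the product of the database graph and an NWA for $q$. The encoding is our own assumption, not the paper's: a database is a set of triples `(o1, r, o2)`, and an NWA is given by an initial state, a dict `delta` mapping `(state, symbol)` to a set of successor states, and a set of final states. Only objects occurring in some edge are considered.

```python
from collections import deque


def eval_rpq(edges, initial, delta, finals):
    """Return q^D: the pairs (x, y) linked by a path whose label is in L(q).

    Assumed encoding (illustrative only): `edges` is a set of triples
    (o1, r, o2); `delta` maps (state, symbol) to a set of states.
    """
    out = {}
    for o1, r, o2 in edges:
        out.setdefault(o1, []).append((r, o2))
    objects = {o for o1, _, o2 in edges for o in (o1, o2)}
    answers = set()
    for x in objects:
        # Explore the product graph: pairs (object reached, automaton state).
        start = (x, initial)
        seen = {start}
        queue = deque([start])
        while queue:
            node, state = queue.popleft()
            if state in finals:
                answers.add((x, node))
            for r, nxt in out.get(node, []):
                for s in delta.get((state, r), ()):
                    if (nxt, s) not in seen:
                        seen.add((nxt, s))
                        queue.append((nxt, s))
    return answers
```

For example, with edges a -r-> b -s-> c and an NWA for $r \cdot s$, the only answer is the pair (a, c). Note that containment between two RPQs is not decided by evaluating them on some database: by the characterization above, it reduces to containment of their regular languages.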

We also consider two-way regular path queries (2RPQs) [s, 10], which extend RPQs with the inverse operator. Formally, let $\Sigma^{\pm} = \Sigma \cup \{r^- \mid r \in \Sigma\}$ be the alphabet including a new symbol $r^-$ for each $r$ in $\Sigma$. Intuitively, $r^-$ denotes the inverse of the binary relation $r$. If $p \in \Sigma^{\pm}$, then we use $p^-$ to mean the inverse of $p$, i.e., if $p$ is $r$, then $p^-$ is $r^-$, and if $p$ is $r^-$, then $p^-$ is $r$. 2RPQs are expressed by means of an NWA over $\Sigma^{\pm}$. When evaluated on a database $\mathcal{D}$ over $\Sigma$, a 2RPQ $q$ computes the set $q^{\mathcal{D}}$ of pairs of objects connected in $\mathcal{D}$ by a semipath that conforms to the regular language $\mathcal{L}(q)$. A semipath in $\mathcal{D}$ from $x$ to $y$ (labeled with $p_1 \cdots p_n$) is a sequence of the form $(y_0, p_1, y_1, \ldots, y_{n-1}, p_n, y_n)$, where $n \geq 0$, $y_0 = x$, $y_n = y$, and for each $y_{i-1}, p_i, y_i$, we have that $p_i \in \Sigma^{\pm}$, and, if $p_i = r$ then $(y_{i-1}, y_i) \in r^{\mathcal{D}}$, and if $p_i = r^-$ then $(y_i, y_{i-1}) \in r^{\mathcal{D}}$. We say that a semipath $(y_0, p_1, \ldots, p_n, y_n)$ conforms to $q$ if $p_1 \cdots p_n \in \mathcal{L}(q)$.
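The semipath semantics can be sketched with a product search in which, for every edge $(o_1, r, o_2)$, we may also step backwards from $o_2$ to $o_1$ by reading $r^-$. In this illustrative Python encoding (an assumption, not taken from the paper) the symbol $r^-$ is written as the string `r + "-"`, so relation names are assumed not to end in `-`; the NWA is a dict from `(state, symbol)` to successor sets.

```python
from collections import deque


def eval_2rpq(edges, initial, delta, finals):
    """Return q^D for a 2RPQ: pairs linked by a semipath conforming to L(q).

    Inverse symbols r^- are encoded as the string r + "-" (an assumption).
    """
    adj = {}
    for o1, r, o2 in edges:
        # Forward step along r, and backward step along r^-.
        adj.setdefault(o1, []).append((r, o2))
        adj.setdefault(o2, []).append((r + "-", o1))
    answers = set()
    for x in adj:
        start = (x, initial)
        seen = {start}
        queue = deque([start])
        while queue:
            node, state = queue.popleft()
            if state in finals:
                answers.add((x, node))
            for p, nxt in adj[node]:
                for s in delta.get((state, p), ()):
                    if (nxt, s) not in seen:
                        seen.add((nxt, s))
                        queue.append((nxt, s))
    return answers
```

With edges a -r-> b and c -r-> b, the 2RPQ $r \cdot r^-$ relates a and c in both directions, and each of them to itself, since both share the $r$-successor b.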

We will also consider conjunctions of 2RPQs and their unions, abbreviated (U)C2RPQs [t], which are (unions of) conjunctive queries constituted only by binary atoms whose predicate is a 2RPQ. Specifically, a C2RPQ $q$ of arity $n$ is written in the form

$$q(x_1, \ldots, x_n) \leftarrow q_1(y_1, y_2) \wedge \cdots \wedge q_m(y_{2m-1}, y_{2m})$$

where $x_1, \ldots, x_n, y_1, \ldots, y_{2m}$ range over a set $\{z_1, \ldots, z_k\}$ of variables, $\{x_1, \ldots, x_n\} \subseteq \{y_1, \ldots, y_{2m}\}$, and each $q_j$ is a 2RPQ. When evaluated over a database $\mathcal{D}$ over $\Sigma$, the C2RPQ $q$ computes the set of tuples $(o_1, \ldots, o_n)$ of objects such that there is a total mapping $\varphi$ from $\{z_1, \ldots, z_k\}$ to the objects in $\mathcal{D}$ with $\varphi(x_i) = o_i$, for $i \in \{1, \ldots, n\}$, and $(\varphi(y_{2j-1}), \varphi(y_{2j})) \in q_j^{\mathcal{D}}$, for $j \in \{1, \ldots, m\}$.
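Read operationally, the definition asks for a total mapping $\varphi$ from the variables to objects. The brute-force Python sketch below enumerates all such mappings. It assumes (hypothetically) that the answer sets $q_j^{\mathcal{D}}$ of the 2RPQ atoms have already been computed, for instance with a product construction over the database and the automaton of each $q_j$. It is exponential in the number of variables and is meant only to make the semantics concrete.

```python
from itertools import product


def eval_c2rpq(head, atoms, objects):
    """Evaluate q(x_1..x_n) <- q_1(y_1, y_2) AND ... AND q_m(y_{2m-1}, y_{2m}).

    `head` lists the distinguished variables x_1..x_n, which must occur in
    some atom; each atom is a triple (y, answers_j, y') where
    answers_j = q_j^D is a precomputed set of pairs of objects.
    """
    variables = sorted({v for y1, _, y2 in atoms for v in (y1, y2)})
    result = set()
    # Enumerate every total mapping phi from the variables to the objects.
    for values in product(sorted(objects), repeat=len(variables)):
        phi = dict(zip(variables, values))
        if all((phi[y1], phi[y2]) in answers for y1, answers, y2 in atoms):
            result.add(tuple(phi[x] for x in head))
    return result
```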

Containment between 2RPQs and (U)C2RPQs can also be characterized in terms of containment between regular languages. We elaborate on this in Section 7. We conclude by observing that (U)CQs, RPQs, 2RPQs, and (U)C2RPQs are monotone.

3. THE VIEW-SYNTHESIS PROBLEM 

The view-synthesis and the view-existence problems refer 
to a scenario with one source schema, one target schema, 
and a set of schema mappings between the two, where the 
goal is to synthesize one view for each source. 

To model the source and the target schemas we refer to two finite alphabets, the source alphabet $\Sigma_S$ and the target alphabet $\Sigma_T$, and to model the queries used in both the mappings and the views, we use three query languages, namely, the source language $\mathcal{Q}_S$ over $\Sigma_S \cup \Sigma_T$, the target language $\mathcal{Q}_T$ over $\Sigma_T$, and the view language $\mathcal{Q}_V$ over $\Sigma_T$. Notice that queries expressed in the language $\mathcal{Q}_S$ may use symbols in the target alphabet.

A schema mapping, or simply a mapping, between the source and the target is a statement of the form $q_s \rightsquigarrow q_t$, with $q_s \in \mathcal{Q}_S$ and $q_t \in \mathcal{Q}_T$. Intuitively, a mapping of this type specifies that all answers computed by executing the source query $q_s$ are answers to the target query $q_t$. This means that $q_s \in \mathcal{Q}_S$ is actually a rewriting of $q_t$. This explains why we allow $\mathcal{Q}_S$ to use symbols in the target alphabet: in general, the rewriting of a target query may use symbols both in the source alphabet and in the target alphabet [26].



The problem we consider aims at defining one view for each source, in such a way that all input schema mappings are captured. The views $\mathcal{V}$ over $\Sigma_T$ to be synthesized are modeled as a (not necessarily total) function $\mathcal{V} : \Sigma_S \to \mathcal{Q}_V$ that associates to each source symbol $a \in \Sigma_S$ a query $\mathcal{V}(a) \in \mathcal{Q}_V$ over the target alphabet $\Sigma_T$. As we said before, our notion of "views capturing a set of mappings" relies on view-based query rewriting, whose definition we now recall. In the following, given a source database $\mathcal{D}_S$ and a target database $\mathcal{D}_T$, we say that $\mathcal{V}$ is coherent with $\mathcal{D}_S$ and $\mathcal{D}_T$ if for each element $a$ in the source alphabet, the extension of this element in $\mathcal{D}_S$ is contained in the result of evaluating $\mathcal{V}(a)$ over the database $\mathcal{D}_T$ (where $\mathcal{V}(a)$ is the query that $\mathcal{V}$ associates to $a$). Formally, $\mathcal{V}$ is coherent with $\mathcal{D}_S$ and $\mathcal{D}_T$ if for each $a \in \Sigma_S$, $a^{\mathcal{D}_S} \subseteq \mathcal{V}(a)^{\mathcal{D}_T}$.



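Following [11], we say that a query $q_s \in \mathcal{Q}_S$ is a sound rewriting, or simply a rewriting, of a query $q_t \in \mathcal{Q}_T$ wrt views $\mathcal{V}$, if for every source database $\mathcal{D}_S$ and for every target database $\mathcal{D}_T$ such that $\mathcal{V}$ is coherent with $\mathcal{D}_S$ and $\mathcal{D}_T$, we have that $q_s^{\mathcal{D}_S \cup \mathcal{D}_T} \subseteq q_t^{\mathcal{D}_T}$. If, for every such pair of databases, $q_s^{\mathcal{D}_S \cup \mathcal{D}_T} = q_t^{\mathcal{D}_T}$, the rewriting is said to be exact. Further, we say that $q_s$ is empty wrt $\mathcal{V}$ if for every source database $\mathcal{D}_S$ and for every target database $\mathcal{D}_T$ such that $\mathcal{V}$ is coherent with $\mathcal{D}_S$ and $\mathcal{D}_T$, we have that $q_s^{\mathcal{D}_S \cup \mathcal{D}_T} = \emptyset$. Notice that, if all views in $\mathcal{V}$ are empty (i.e., for each $a \in \Sigma_S$, $\mathcal{V}(a)$ is the empty query), then trivially $q_s$ is empty wrt $\mathcal{V}$. However, $q_s$ may be empty wrt $\mathcal{V}$ even in the case in which all (or some) views are non-empty.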

We observe that, when $\mathcal{Q}_S$ and $\mathcal{Q}_V$ are monotonic query languages, the above definitions of sound and exact rewritings are equivalent to the ones where the notion of "$\mathcal{V}$ being coherent with $\mathcal{D}_S$ and $\mathcal{D}_T$" is replaced by the condition: for each $a \in \Sigma_S$, $a^{\mathcal{D}_S} = \mathcal{V}(a)^{\mathcal{D}_T}$ (see […]). It is easy to see that, under this monotonicity assumption, $q_s$ is a rewriting of $q_t$ wrt views $\mathcal{V}$ if and only if $q_s[\mathcal{V}] \subseteq q_t$, where here and in the following we use $q_s[\mathcal{V}]$ to denote the query over $\Sigma_T$ obtained from $q_s$ by substituting each source symbol $a \in \Sigma_S$ with the query $\mathcal{V}(a)$. Further, $q_s$ is empty wrt $\mathcal{V}$ if and only if $q_s[\mathcal{V}]$ is empty, and it is an exact rewriting wrt $\mathcal{V}$ if and only if $q_s[\mathcal{V}] \equiv q_t$. Note that in all the settings considered in the next sections, the languages $\mathcal{Q}_S$ and $\mathcal{Q}_V$ are monotonic.
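For the relational case, the substitution $q_s[\mathcal{V}]$ is the familiar unfolding of view atoms. The Python sketch below assumes a simple, hypothetical encoding: a CQ is a pair `(head, body)`, where `body` is a list of atoms `(predicate, args)`, and `views` maps each source symbol to a CQ whose head variables are pairwise distinct. Atoms over $\Sigma_T$ are kept unchanged, while each source atom $a(t_1, \ldots, t_k)$ is replaced by the body of $\mathcal{V}(a)$, with the head variables bound to $t_1, \ldots, t_k$ and all other view variables renamed apart.

```python
from itertools import count


def unfold(query, views):
    """Return q_s[V]: replace every source atom by the body of its view.

    Assumes views have pairwise distinct head variables and that terms are
    variables (no constants). Fresh variable names "_f0", "_f1", ... are
    assumed not to clash with names used in the query.
    """
    head, body = query
    fresh = count()
    new_body = []
    for pred, args in body:
        if pred not in views:  # a target symbol: keep the atom unchanged
            new_body.append((pred, tuple(args)))
            continue
        view_head, view_body = views[pred]
        # Bind the view's head variables to the atom's arguments; every
        # other view variable gets a fresh name for this occurrence.
        subst = dict(zip(view_head, args))
        for v_pred, v_args in view_body:
            renamed = []
            for var in v_args:
                if var not in subst:
                    subst[var] = f"_f{next(fresh)}"
                renamed.append(subst[var])
            new_body.append((v_pred, tuple(renamed)))
    return tuple(head), new_body
```

Each occurrence of a source symbol receives its own fresh existential variables, which keeps the copies of $\mathcal{V}(a)$ independent of one another.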

We are now ready to come back to the notion of "views capturing a set of mappings". We say that views $\mathcal{V}$ capture mappings $\mathcal{M}$ if for each $q_s \rightsquigarrow q_t \in \mathcal{M}$, the query $q_s$ is a rewriting of $q_t$ wrt $\mathcal{V}$ and is non-empty wrt $\mathcal{V}$. Analogously, we say that views $\mathcal{V}$ exactly capture $\mathcal{M}$ if for each mapping $q_s \rightsquigarrow q_t \in \mathcal{M}$, the query $q_s$ is an exact rewriting of $q_t$ wrt $\mathcal{V}$ and is non-empty wrt $\mathcal{V}$.

We are now ready to introduce the (exact) view-synthesis 
and the (exact) view-existence problems formally. 

Definition 1. The (exact) view-synthesis problem is defined as follows: given a set $\mathcal{M}$ of mappings, find views $\mathcal{V}$ (exactly) capturing $\mathcal{M}$.

The (exact) view-existence problem is defined as follows: given a set $\mathcal{M}$ of mappings, decide whether there exist views $\mathcal{V}$ (exactly) capturing $\mathcal{M}$.

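Finally, we also consider maximal views capturing mappings $\mathcal{M}$, which are views $\mathcal{V}$ such that there is no view $\mathcal{V}'$ capturing $\mathcal{M}$ such that (i) $\mathcal{V}(a) \subseteq \mathcal{V}'(a)$ for every $a \in \Sigma_S$, and (ii) $\mathcal{V}(a) \not\equiv \mathcal{V}'(a)$ for some $a \in \Sigma_S$.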

4. VIEW SYNTHESIS FOR (U)CQs 

We start our investigations by tackling the case of view- 
synthesis and view-existence for conjunctive queries (CQs) 
and their unions (UCQs). 

We start with the case where $\mathcal{Q}_S$, $\mathcal{Q}_T$, and $\mathcal{Q}_V$ are CQs, and establish the following upper bounds.

Theorem 2. In the case where $\mathcal{Q}_S$, $\mathcal{Q}_T$, and $\mathcal{Q}_V$ are CQs, the view-existence and the exact view-existence problems are in NP.

Proof. Consider a mapping $q_s \rightsquigarrow q_t$, where $q_t$ contains $\ell$ atoms, and views $\mathcal{V}$ such that $q_s[\mathcal{V}] \subseteq q_t$. Then, there exists a containment mapping from $q_t$ to $q_s[\mathcal{V}]$, and at most $\ell$ atoms of $q_s[\mathcal{V}]$ will be in the image of this containment mapping. Hence, for each symbol $a \in \Sigma_S$ occurring in $q_s$, at most $\ell$ atoms in query $\mathcal{V}(a)$ are needed to satisfy the containment mapping. In general, for a set $\mathcal{M}$ of mappings, in order to satisfy all containment mappings from $q_t$ to $q_s[\mathcal{V}]$, for each $q_s \rightsquigarrow q_t \in \mathcal{M}$, we need in the query $\mathcal{V}(a)$ at most $\ell_{\mathcal{M}} = \sum_{q_s \rightsquigarrow q_t \in \mathcal{M}} \ell_{q_t}$ atoms, where $\ell_{q_t}$ is the number of atoms in $q_t$. Hence, in order to synthesize the views $\mathcal{V}$, it suffices to guess, for each symbol $a \in \Sigma_S$ appearing in one of the mappings in $\mathcal{M}$, a CQ $\mathcal{V}(a)$ over $\Sigma_T$ of size at most $\ell_{\mathcal{M}}$, and check that $q_s[\mathcal{V}] \subseteq q_t$ (and $q_t \subseteq q_s[\mathcal{V}]$ for the exact variant), for each $q_s \rightsquigarrow q_t \in \mathcal{M}$. This immediately gives us an NP upper bound for the (exact) view-existence problem. □
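The check at the heart of the proof, $q_s[\mathcal{V}] \subseteq q_t$, is the classical containment-mapping test: $q_1 \subseteq q_2$ iff there is a mapping from the variables of $q_2$ to those of $q_1$ that sends the head of $q_2$ onto the head of $q_1$ and every atom of $q_2$ onto an atom of $q_1$. The Python sketch below replaces the nondeterministic guess by exhaustive enumeration, so it runs in exponential time. It uses a hypothetical `(head, body)` encoding of CQs, with body atoms `(predicate, args)` and variables only (no constants).

```python
from itertools import product


def contained_in(q1, q2):
    """Decide q1 ⊆ q2 for CQs by searching for a containment mapping q2 -> q1."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    facts = {(p, tuple(args)) for p, args in body1}
    vars1 = sorted({t for _, args in body1 for t in args} | set(head1))
    vars2 = sorted({t for _, args in body2 for t in args} | set(head2))
    # Try every mapping h from the variables of q2 to the variables of q1.
    for image in product(vars1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if tuple(h[t] for t in head2) != tuple(head1):
            continue
        if all((p, tuple(h[t] for t in args)) in facts for p, args in body2):
            return True
    return False
```

Applied to an unfolded $q_s[\mathcal{V}]$ as `q1` and $q_t$ as `q2`, this verifies a candidate $\mathcal{V}$ deterministically; the NP bound instead comes from guessing both the views and the containment mapping.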

In the case where Qs and Qt are UCQs, we can general- 
ize the above argument by considering containment between 
UCQs instead of containment between CQs. 

Theorem 3. In the case where $\mathcal{Q}_S$ and $\mathcal{Q}_T$ are UCQs and $\mathcal{Q}_V$ consists of CQs, the view-existence and the exact view-existence problems are in NP.

Proof. Consider a mapping $q_s \rightsquigarrow q_t$ and views $\mathcal{V}$ such that $q_s[\mathcal{V}] \subseteq q_t$. We have that $q_s[\mathcal{V}] \subseteq q_t$ iff for each CQ $q_1$ in the UCQ $q_s[\mathcal{V}]$ there is a CQ $q_2$ in the UCQ $q_t$ such that $q_1 \subseteq q_2$. For a set $\mathcal{M}$ of mappings, in order to satisfy all containment mappings from $q_t$ to $q_s[\mathcal{V}]$, for each $q_s \rightsquigarrow q_t \in \mathcal{M}$, we need in the query $\mathcal{V}(a)$ at most $\ell_{\mathcal{M}} = \sum_{q_s \rightsquigarrow q_t \in \mathcal{M}} \ell_{q_t}$ atoms, where $\ell_{q_t}$ (this time) is the maximum number of atoms in each of the CQs in $q_t$. Again, in order to synthesize the views $\mathcal{V}$, it suffices to guess, for each symbol $a \in \Sigma_S$ appearing in one of the mappings in $\mathcal{M}$, a CQ $\mathcal{V}(a)$ over $\Sigma_T$ of size at most $\ell_{\mathcal{M}}$, and check that $q_s[\mathcal{V}] \subseteq q_t$ (and $q_t \subseteq q_s[\mathcal{V}]$ for the exact variant), for each $q_s \rightsquigarrow q_t \in \mathcal{M}$. □

The last case we consider is the one where, in addition to $\mathcal{Q}_S$ and $\mathcal{Q}_T$, $\mathcal{Q}_V$ also consists of UCQs. As for view existence, we observe that the problem admits a solution for UCQ views iff it admits a solution for CQ views.

Lemma 4. An instance of the view-existence problem admits a solution in the case where $\mathcal{Q}_S$, $\mathcal{Q}_T$, and $\mathcal{Q}_V$ are UCQs iff it admits a solution in the case where $\mathcal{Q}_S$ and $\mathcal{Q}_T$ are UCQs and $\mathcal{Q}_V$ consists of CQs.

Proof. Indeed, let $\mathcal{V}$ be a set of UCQ views such that $q_s[\mathcal{V}] \subseteq q_t$ for each mapping $q_s \rightsquigarrow q_t \in \mathcal{M}$. For such a mapping, $q_s[\mathcal{V}]$ is a nonempty positive query. Consider the views $\mathcal{V}'$ obtained from $\mathcal{V}$ by choosing, for each symbol $a$ in $\Sigma_S$, as $\mathcal{V}'(a)$ one of the CQs in $\mathcal{V}(a)$. Then, each nonempty CQ in $q_s[\mathcal{V}']$ is contained in $q_s[\mathcal{V}]$, and hence in $q_t$. It follows that the views $\mathcal{V}'$ also provide a solution to the view-synthesis problem. □

Hence by the above lemma, we trivially get: 

Theorem 5. In the case where $\mathcal{Q}_S$, $\mathcal{Q}_T$, and $\mathcal{Q}_V$ are UCQs, the view-existence problem is in NP.

As for exact view existence, allowing for views that are UCQs does change the problem.

Theorem 6. In the case where $\mathcal{Q}_S$, $\mathcal{Q}_T$, and $\mathcal{Q}_V$ are UCQs, the exact view-existence problem is in $\Pi_2^p$.



Proof. Let $\mathcal{V}$ be a set of UCQ views such that $q_s[\mathcal{V}] \equiv q_t$ for each mapping $q_s \rightsquigarrow q_t \in \mathcal{M}$. Let us first consider one such mapping $q_s \rightsquigarrow q_t$, and let $m_{q_t}$ be the number of CQs in $q_t$, and $\ell_{q_t}$ the maximum number of atoms in each of the CQs in $q_t$. Since $q_s[\mathcal{V}] \subseteq q_t$, there is a containment mapping from each of the $m_{q_t}$ CQs in $q_t$ to some CQ in the UCQ $q'_s[\mathcal{V}]$, where $q'_s[\mathcal{V}]$ is obtained from $q_s[\mathcal{V}]$ by distributing, for each atom $\alpha$ of $q_s$, the unions in the UCQ $\alpha[\mathcal{V}]$ over the conjunctions of each CQ of $q_s$. Hence, for each symbol $a \in \Sigma_S$ occurring in $q_s$, at most $m_{q_t}$ CQs of at most $\ell_{q_t}$ atoms in query $\mathcal{V}(a)$ are needed to satisfy the containment mappings. It follows that, to check the existence of UCQ views $\mathcal{V}$ and of such a containment mapping, it suffices to guess for each $a$ a UCQ over $\Sigma_T$ consisting of at most $m_{q_t}$ CQs, each with at most $\ell_{q_t}$ atoms. When considering all mappings $q_s \rightsquigarrow q_t \in \mathcal{M}$, similarly to the case above, we have to use, instead of $m_{q_t}$ and $\ell_{q_t}$, the sum of these parameters over all mappings in $\mathcal{M}$. To check whether these views satisfy $q_t \subseteq q'_s[\mathcal{V}]$, it suffices to check for the existence of a containment mapping from $q'_s[\mathcal{V}]$ to each of the CQs in $q_t$, which can be done in NP in the size of $q_t$. To check whether these views satisfy $q_s[\mathcal{V}] \subseteq q_t$, we have to check whether for each CQ $q'$ obtained by selecting one of the CQs $q''$ in $q_s$ and then substituting each atom $\alpha$ in $q''$ with one of the CQs in $\alpha[\mathcal{V}]$, there is a containment mapping from some CQ in $q_t$ to $q'$. We can do so by a coNP computation that makes use of an NP oracle to check for the existence of a containment mapping. This gives us the $\Pi_2^p$ upper bound. □
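The universal part of the $\Pi_2^p$ check ranges over the CQs $q'$ obtained by distributing unions. The Python sketch below, under a hypothetical encoding in which $q_s$ is a list of CQ bodies (lists of atoms `(predicate, args)`) and `ucq_views` maps each source symbol to the list of disjuncts of $\mathcal{V}(a)$, enumerates these choices without performing the unfolding itself. It makes visible that their number can be exponential, which is why the proof resorts to a coNP computation with an NP oracle.

```python
from itertools import product


def expansions(ucq_s, ucq_views):
    """Enumerate the CQs q' of the expansion of q_s[V] when views are UCQs.

    Each q' is described by (index of the CQ q'' of q_s, {atom position:
    index of the chosen disjunct of V(a)}) for the source atoms of q''.
    """
    for cq_index, body in enumerate(ucq_s):
        source_atoms = [(pos, pred) for pos, (pred, _) in enumerate(body)
                        if pred in ucq_views]
        choices = [range(len(ucq_views[pred])) for _, pred in source_atoms]
        # One disjunct per source atom: the product distributes the unions.
        for picks in product(*choices):
            yield cq_index, dict(zip((pos for pos, _ in source_atoms), picks))
```

For a CQ with $k$ source atoms whose views have $d$ disjuncts each, this yields $d^k$ CQs; each must then be shown to be contained in some CQ of $q_t$.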

5. TREE-BASED SOLUTION FOR RPQs 

We now address the view-synthesis problem when $\mathcal{Q}_S$, $\mathcal{Q}_T$, and $\mathcal{Q}_V$ are RPQs, and present a technique based on automata on infinite trees [34]. Specifically, we consider automata running over complete labeled $\Sigma$-trees (i.e., trees in which the set of nodes is the set of all strings in $\Sigma^*$).

First, we observe that every language $L$ over an alphabet $\Sigma$ can be represented as a function $L : \Sigma^* \to \{0, 1\}$, which, in turn, can be considered as a $\{0, 1\}$-labeling of the complete $\Sigma$-tree. Consider a source alphabet $\Sigma_S = \{a_1, \ldots, a_n\}$ and the target alphabet $\Sigma_T$. We can represent the views defined on $\Sigma_S$ by the $\{0, 1\}^n$-labeled $\Sigma_T$-tree $T_{\mathcal{V}}$ (i.e., a $\Sigma_T$-tree in which each node is labeled with an $n$-tuple of elements of $\{0, 1\}$) in which the nodes representing the words in $\mathcal{V}(a_i)$ are exactly those whose label has 1 in the $i$-th component. We call such trees view trees. Note that views defined by view trees assign an arbitrary language over $\Sigma_T$ to each source relation; these languages need not, a priori, be regular. We return to this point later.
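Since the complete $\Sigma_T$-tree is infinite, only a finite prefix of a view tree can be written down explicitly. The following Python sketch assumes (hypothetically, since view trees need not be regular) that each $\mathcal{V}(a_i)$ is regular and given as an NWA `(initial, delta, finals)`, and computes the label of every node $w$ with $|w| \le$ `depth`. Node names are obtained by concatenating symbols, so single-character symbols are assumed.

```python
from itertools import product


def accepts(nwa, word):
    """Subset simulation of an NWA (initial, delta, finals) on a word."""
    initial, delta, finals = nwa
    current = {initial}
    for c in word:
        current = {s for q in current for s in delta.get((q, c), ())}
    return bool(current & finals)


def view_tree_prefix(views, target_alphabet, depth):
    """Label each node w (a word over Sigma_T, |w| <= depth) with the n-tuple
    whose i-th component is 1 iff w belongs to V(a_i)."""
    labels = {}
    for k in range(depth + 1):
        for word in product(sorted(target_alphabet), repeat=k):
            labels["".join(word)] = tuple(int(accepts(v, word)) for v in views)
    return labels
```

For instance, with $\mathcal{V}(a_1) = \{r\}$ and $\mathcal{V}(a_2) = s^*$, the root (the empty word) is labeled $(0, 1)$ and the node $r$ is labeled $(1, 0)$.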

Given a mapping $m = q_s \rightsquigarrow q_t$, we now construct a tree automaton $\mathcal{A}_m$ accepting all view trees representing views $\mathcal{V}$ capturing $m$. Concerning the check that $q_s$ is not empty wrt $\mathcal{V}$, we observe that this is the case if and only if there is a word $w = c_1 \cdots c_k$ in $\mathcal{L}(\mathcal{A}_s)$ such that, for each of the source letters $a_{i_1}, \ldots, a_{i_h}$ appearing in $w$, there is a node in the tree whose label has 1 in the corresponding ($i_j$-th) component. The tree automaton has to guess a set of letters in $\Sigma_S$ that cover a word accepted by $\mathcal{A}_s$ (we can ignore the letters in $\Sigma_T$), and then check the above condition.
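Once the set of source letters with nonempty views is fixed, the non-emptiness condition becomes a reachability question on $\mathcal{A}_s$. The Python sketch below performs a deterministic search in place of the guess made by the tree automaton; the encoding of $\mathcal{A}_s$ as `(initial, delta, finals)`, with `delta` a dict from `(state, symbol)` to successor sets, is our own assumption. Letters of $\Sigma_T$ are always allowed, while a source letter may be read only if its view is nonempty.

```python
from collections import deque


def nonempty_wrt_views(source_nwa, source_symbols, nonempty_views):
    """True iff L(A_s) has a word whose source letters a all have V(a) nonempty."""
    initial, delta, finals = source_nwa
    seen = {initial}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        if s in finals:
            return True
        for (state, c), succs in delta.items():
            # Letters of Sigma_T are always usable; a source letter only if
            # its view is nonempty.
            usable = c not in source_symbols or c in nonempty_views
            if state == s and usable:
                for s2 in succs:
                    if s2 not in seen:
                        seen.add(s2)
                        queue.append(s2)
    return False
```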

We assume that $q_s$ is represented as an NWA $\mathcal{A}_s = (S_s, \Sigma_S \cup \Sigma_T, p_s, \delta_s, F_s)$ and $q_t$ is represented as an NWA $\mathcal{A}_t = (S_t, \Sigma_T, p_t, \delta_t, F_t)$.¹ An annotation for a view tree $T_{\mathcal{V}}$ is a ternary relation $\alpha \subseteq S_t \times S_t \times \Sigma_S$. An annotation $\alpha$ is correct for $T_{\mathcal{V}}$ if the following holds: $(p, p', a_i) \in \alpha$ iff there is a word $w \in \Sigma_T^*$ such that $T_{\mathcal{V}}(w)[i] = 1$ and $p' \in \delta_t(p, w)$. Intuitively, $\alpha$ describes the transitions that $T_{\mathcal{V}}$ can induce on $\mathcal{A}_t$.

¹ Transition functions of NWAs can be extended to sets of states and words in a standard way.
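When every $\mathcal{V}(a_i)$ is regular, the correct annotation can be computed directly from its definition: $(p, p', a_i)$ belongs to it iff some word of $\mathcal{V}(a_i)$ leads $\mathcal{A}_t$ from $p$ to $p'$, which is a reachability question in the product of $\mathcal{A}_t$ with an NWA for $\mathcal{V}(a_i)$. The Python sketch below assumes this regularity and a dictionary-based NWA encoding of our own; it is not part of the construction in the paper, where the annotation is guessed by the tree automaton.

```python
from collections import deque


def correct_annotation(t_states, t_delta, views, target_alphabet):
    """Compute {(p, p', a) : some w in V(a) leads A_t from p to p'}.

    `t_delta` maps (state, symbol) to successor sets of A_t; `views` maps each
    source symbol a to an NWA (initial, delta, finals) for V(a).
    """
    alpha = set()
    for a, (v_init, v_delta, v_finals) in views.items():
        for p in t_states:
            # Search the product of A_t (started in p) and the NWA for V(a).
            start = (p, v_init)
            seen = {start}
            queue = deque([start])
            while queue:
                pt, pv = queue.popleft()
                if pv in v_finals:  # the word read so far belongs to V(a)
                    alpha.add((p, pt, a))
                for c in target_alphabet:
                    for nt in t_delta.get((pt, c), ()):
                        for nv in v_delta.get((pv, c), ()):
                            if (nt, nv) not in seen:
                                seen.add((nt, nv))
                                queue.append((nt, nv))
    return alpha
```

For example, if $\mathcal{A}_t$ accepts $r \cdot s$ (states 0, 1, 2), $\mathcal{V}(a) = \{r\}$, and $\mathcal{V}(b) = \{s\}$, the correct annotation is $\{(0, 1, a), (1, 2, b)\}$.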

We say that an annotation a captures qa ~> qt if for every 
word «) = ci • • • Cfc in (As) there is a sequence po, . . . ,Pk+i 
of states of At such that po = Pt, Pk+i G Ft, and, for i £ 
{0, . . . , k}, if d G Et then pi+i G 5t(pi, Ci), and if Ci = aj £ 
Ss, then (pi,pi+i,aj) G a. 

The significance of an annotation capturing a mapping 
comes from the following lemma. 

Lemma 7. $V$ captures $q_s \rightsquigarrow q_t$ iff there is an annotation $\alpha$ that is correct for $T_V$ and captures $q_s \rightsquigarrow q_t$.

We now characterize when $\alpha$ captures $q_s \rightsquigarrow q_t$.

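Lemma 8. $\alpha$ does not capture $q_s \rightsquigarrow q_t$ iff there is a word $w = c_0 \cdots c_k$ in $\mathcal{L}(A_s)$ and a sequence $P_0, \ldots, P_{k+1}$ of sets of states of $A_t$ such that $P_0 = \{p_t\}$, $P_{k+1} \cap F_t = \emptyset$, and, for $i \in \{0, \ldots, k\}$, if $c_i \in \Sigma_t$ then $P_{i+1} = \delta_t(P_i, c_i)$, and if $c_i = a_j \in \Sigma_s$, $p \in P_i$, and $(p, p', a_j) \in \alpha$, then $p' \in P_{i+1}$.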

Thus, checking that $\alpha$ does not capture $q_s \rightsquigarrow q_t$ can be done by guessing the word $w$ and the sequence $P_0, \ldots, P_{k+1}$ of sets of states of $A_t$ and checking the conditions. This can be done in space logarithmic in $A_s$ and polynomial in $A_t$. It follows that we can check that an annotation $\alpha$ captures $q_s \rightsquigarrow q_t$ in time that is polynomial in $A_s$ and exponential in $A_t$.
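The check of Lemma 8 can be made deterministic by always choosing the smallest sets allowed, namely $P_{i+1} = \delta_t(P_i, c_i)$ for target letters and $P_{i+1} = \{p' \mid p \in P_i, (p, p', a_j) \in \alpha\}$ for source letters; since both updates are monotone, a witness exists iff one exists with these minimal sets. The Python sketch below explores the pairs (state of $A_s$, set of states of $A_t$) breadth-first, which matches the time bound stated above. The dictionary encoding of the automata and the name `annotation_captures` are assumptions for illustration, and the sketch checks only the condition of Lemma 8, not the nonemptiness of $q_s$ wrt $V$.

```python
from collections import deque
from typing import Dict, FrozenSet, Set, Tuple

Delta = Dict[Tuple[int, str], Set[int]]
NWA = Tuple[int, Delta, Set[int]]  # (initial state, transitions, final states)


def annotation_captures(src: NWA, tgt: NWA, alpha: Set[Tuple[int, int, str]],
                        source_letters: Set[str]) -> bool:
    """True iff alpha captures q_s ~> q_t, i.e. no witness as in Lemma 8 exists."""
    s0, s_delta, s_finals = src
    t0, t_delta, t_finals = tgt

    def successors(P: FrozenSet[int], c: str) -> FrozenSet[int]:
        if c in source_letters:
            # smallest P_{i+1} allowed for c = a_j: the alpha-successors of P_i
            return frozenset(q for (p, q, a) in alpha if a == c and p in P)
        # c in Sigma_t: P_{i+1} = delta_t(P_i, c)
        return frozenset(q for p in P for q in t_delta.get((p, c), set()))

    start = (s0, frozenset({t0}))
    seen = {start}
    queue = deque([start])
    while queue:
        s, P = queue.popleft()
        if s in s_finals and not (P & t_finals):
            return False  # an accepted word of A_s whose set P_{k+1} misses F_t
        for (s_src, c), targets in s_delta.items():
            if s_src != s:
                continue
            P2 = successors(P, c)
            for s2 in targets:
                if (s2, P2) not in seen:
                    seen.add((s2, P2))
                    queue.append((s2, P2))
    return True


A_S: NWA = (0, {(0, "a1"): {1}, (1, "a2"): {2}}, {2})  # q_s = a1 . a2
A_T: NWA = (0, {(0, "b1"): {1}, (1, "b2"): {2}}, {2})  # q_t = b1 . b2
```

With $\alpha = \{(0, 1, a_1), (1, 2, a_2)\}$ the function returns `True`; dropping $(1, 2, a_2)$ leaves $P_2 = \emptyset$ and yields `False`.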

We now describe a tree automaton $A_m$ that accepts precisely the view trees $T_V$ such that $V$ captures $m = q_s \rightsquigarrow q_t$. By Lemma 7, all $A_m$ has to do is guess an annotation $\alpha$ that captures $m$ and check that it is correct for $T_V$.

Lemma 9. Given $A_s$ and $A_t$, we can construct a tree automaton $A_m$ that accepts all view trees that capture $m = q_s \rightsquigarrow q_t$. The size of $A_m$ is exponential in the sizes of $A_s$ and $A_t$.

Proof. We construct $A_m = (S_m, \Sigma_m, P_m, \delta_m, F_m)$ as a Büchi automaton on infinite trees [34]. Recall that $\Sigma_m = \{0, 1\}^n$. The state set is $S_m = 2^{S_t \times S_t \times \Sigma_s} \times 2^{S_t \times S_t \times \Sigma_s} \times 2^{S_t \times S_t}$. That is, each state is a triple consisting of a pair of annotations and a binary relation on $S_t$. The initial state set $P_m$ consists of all triples $\beta = (\alpha, \alpha, R_\varepsilon)$, where $\alpha$ captures $m$ and $R_\varepsilon = \{(p, p) \mid p \in S_t\}$. Intuitively, an initial state is a guess of an annotation. The automaton $A_m$ now has to check its correctness; the second and third components of the state are used for "bookkeeping."

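Let $\Sigma_t = \{b_1, \ldots, b_k\}$. Then $(\beta_1, \ldots, \beta_k) \in \delta_m(\beta, c)$, where $c = (c_1, \ldots, c_n)$, $\beta = (\alpha^1, \alpha^2, \alpha^3)$, and $\beta_j = (\alpha^1_j, \alpha^2_j, \alpha^3_j)$ for $j \in \{1, \ldots, k\}$, if the following hold: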

1. If $(p_1, p_2) \in \alpha^3$ and $c_i = 1$, then $(p_1, p_2, a_i) \in \alpha^1$.

2. $\alpha^1_j = \alpha^1$; that is, the first component does not change.

3. $\alpha^3_j = \{(p_1, p_3) \mid (p_1, p_2) \in \alpha^3 \text{ and } p_3 \in \delta_t(p_2, b_j)\}$; that is, the third component remembers paths between states of $A_t$.

4. If $(p_1, p_2, a_i) \in \alpha^2$, then either $p_1 = p_2$ and $c_i = 1$, or, for some $j \in \{1, \ldots, k\}$ and $p'_1 \in \delta_t(p_1, b_j)$, we have that $(p'_1, p_2, a_i) \in \alpha^2_j$.



Thus, the second component of the state helps to check that all the paths in $A_t$ predicted by the guessed annotation are fulfilled in the tree, while the third component helps to check that all the paths that do occur in the tree are predicted by the guessed annotation. This means that the second component must ultimately become empty. Note that once it becomes empty, it can stay empty. Thus the set $F_m$ of accepting states consists of all triples of the form $(\alpha, \emptyset, R)$.

Note that the number of states of $A_m$ is exponential in the number of states of $A_t$ and exponential in the alphabet of $A_s$. The alphabet of $A_m$ is exponential in the size of the alphabet of $A_s$. □

Theorem 10. In the case where $Q_s$, $Q_t$, and $Q_v$ are RPQs, the view-existence problem is in ExpTime.

Proof. We showed how to construct, with an exponential blowup, a Büchi tree automaton that accepts all view trees that capture $m = q_s \rightsquigarrow q_t$. Note that computing the set of initial states requires applying Lemma 8 and takes exponential time. To handle a set $M$ of mappings, we simply take the product of these automata (see the product construction in ?3?). To check that the views are nonempty, we take the product with a very simple automaton that checks that one of the labels of the tree is not identically 0. We thus obtain a Büchi tree automaton $A_M$ that accepts all view trees that represent nonempty views that capture $M$.

We can now check the nonemptiness of $A_M$ in quadratic time [34]. If $\mathcal{L}(A_M) = \emptyset$, then the answer to the view-existence problem is negative. If $\mathcal{L}(A_M) \neq \emptyset$, then the nonemptiness algorithm returns a witness in the form of a transducer $A = (S, \Sigma_t, \Sigma_m, p_0, \delta, \gamma)$, where $S$ is a set of states (which is a subset of the state set of the tree automaton), $\Sigma_t$ is the input alphabet, $\Sigma_m = \{0, 1\}^n$ is the output alphabet, $p_0$ is a start state, $\delta : S \times \Sigma_t \to S$ is a deterministic transition function, and $\gamma : S \to \Sigma_m$ is the output function. From this transducer we can obtain an RPQ for each letter $a_i \in \Sigma_s$, represented by the DWA $A_i = (S, \Sigma_t, p_0, \delta, F_i)$, where $F_i = \{p \mid p \in S \text{ and } \gamma(p)[i] = 1\}$. □
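The last step of the proof is a direct construction. The Python sketch below reads off the DWA $A_i$ from a witness transducer by selecting the final states $F_i$. The tuple encoding of the transducer and the small example transducer (which represents $V(a_1) = b_1$ and $V(a_2) = \emptyset$) are hypothetical, chosen only to illustrate the construction.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding of the witness transducer A = (S, Sigma_t, Sigma_m, p0, delta, gamma);
# Sigma_m = {0,1}^n is implicit in the tuples returned by gamma.
Transducer = Tuple[Set[str], Set[str], str, Dict[Tuple[str, str], str], Dict[str, Tuple[int, ...]]]


def view_dwa(transducer: Transducer, i: int):
    """DWA A_i = (S, Sigma_t, p0, delta, F_i) with F_i = {p in S | gamma(p)[i] = 1} (i is 1-based)."""
    states, sigma_t, p0, delta, gamma = transducer
    finals = {p for p in states if gamma[p][i - 1] == 1}
    return states, sigma_t, p0, delta, finals


def dwa_accepts(dwa, word) -> bool:
    states, sigma_t, p0, delta, finals = dwa
    p = p0
    for b in word:
        p = delta[(p, b)]  # delta is total: S x Sigma_t -> S
    return p in finals


# Example witness: V(a1) = b1 and V(a2) is empty.
T: Transducer = (
    {"x", "y", "z"},
    {"b1", "b2"},
    "x",
    {("x", "b1"): "y", ("x", "b2"): "z", ("y", "b1"): "z", ("y", "b2"): "z",
     ("z", "b1"): "z", ("z", "b2"): "z"},
    {"x": (0, 0), "y": (1, 0), "z": (0, 0)},
)
```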

Note that the proof of Theorem 10 implies that, wrt the view-existence problem, considering views that are RPQs (as opposed to general, possibly non-regular, path languages) is not a restriction, since the existence of general views implies the existence of regular ones. In fact, a similar result also holds for the exact view-existence problem, as follows from the results in the next section. This is also in line with a similar observation holding for the existence of rewritings of RPQs wrt RPQ views [9].

A final comment regarding maximal views. A view tree $T_V$ is maximal with respect to a set $M$ of mappings if $V$ captures $M$, but flipping a single 0 to 1 in one of the labels would destroy that property. Our tree-automata techniques can be extended to produce maximal views by quantifying over all such flippings. This, however, would imply an additional exponential increase in the complexity of the algorithm.

6. CONGRUENCE CLASS BASED SOLUTION FOR RPQs

We now present an alternative technique for view synthesis for RPQs that will also allow us to extend our results to more expressive forms of queries. Our solution is based on the characterization of regular languages by means of congruence classes [27].

We start by showing that we can reduce the (exact) view-synthesis problem with a set of mappings $M$ to an (exact) view-synthesis problem with a single mapping $m$.

Theorem 11. Given a set $M$ of RPQ mappings, there is a single RPQ mapping $m$ such that, for every set $V$ of RPQ views, $V$ (exactly) captures $M$ iff $V$ (exactly) captures $m$.

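Proof. Let $M = \{q_{0,s} \rightsquigarrow q_{0,t}, \ldots, q_{h,s} \rightsquigarrow q_{h,t}\}$ be the set of mappings from $\Sigma_s \cup \Sigma_t$ to $\Sigma_t$, and let $\Sigma'_t = \Sigma_t \cup \{\#\}$, where $\#$ is a fresh target symbol not occurring in $\Sigma_s$ and $\Sigma_t$. We define a mapping $m = q_{M,s} \rightsquigarrow q_{M,t}$ from $\Sigma_s \cup \Sigma'_t$ to $\Sigma'_t$ by setting $q_{M,s} = q_{0,s} \cdot \# \cdot q_{1,s} \cdot \# \cdots \# \cdot q_{h,s}$ and $q_{M,t} = q_{0,t} \cdot \# \cdot q_{1,t} \cdot \# \cdots \# \cdot q_{h,t}$. Intuitively, the fresh symbol $\#$ acts as a separator between the different parts of $q_{M,s}$ and $q_{M,t}$. It is easy to verify that $q_{i,s}[V] \neq \emptyset$ and $q_{i,s}[V] \subseteq q_{i,t}$ for every $i \in \{0, \ldots, h\}$ iff $q_{M,s}[V] \neq \emptyset$ and $q_{M,s}[V] \subseteq q_{M,t}$, and analogously for equality. □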
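The construction in the proof is purely syntactic. The Python sketch below builds $q_{M,s}$ and $q_{M,t}$ as regular-expression strings from a list of mappings. It assumes, for illustration only, that every symbol is a single character and that the queries are written in Python `re` syntax; the function name `combine_mappings` is hypothetical.

```python
import re
from typing import List, Tuple


def combine_mappings(mappings: List[Tuple[str, str]], sep: str = "#") -> Tuple[str, str]:
    """Build q_{M,s} = q_{0,s} # q_{1,s} # ... # q_{h,s} and the analogous q_{M,t}.

    Each query is a Python regular expression whose symbols are single characters;
    `sep` must not occur in any of them (it plays the role of the fresh symbol #)."""
    for qs, qt in mappings:
        if sep in qs or sep in qt:
            raise ValueError("the separator must be a fresh symbol")
    glue = re.escape(sep)
    q_ms = glue.join(f"(?:{qs})" for qs, _ in mappings)
    q_mt = glue.join(f"(?:{qt})" for _, qt in mappings)
    return q_ms, q_mt


# Two mappings: a ~> xy and c* ~> z* (symbols are single characters).
Q_MS, Q_MT = combine_mappings([("a", "xy"), ("c*", "z*")])
```

Wrapping each query in a non-capturing group keeps a top-level union inside one part from absorbing the separator.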

Hence, w.l.o.g., in the following we consider only the case where there is a single mapping $q_s \rightsquigarrow q_t$.

Let $A_t = (S_t, \Sigma_t, p_t, \delta_t, F_t)$ be an NWA for $q_t$. Then $A_t$ defines a set of (left-right) congruence classes partitioning $\Sigma_t^*$. Note that the standard treatment of congruence classes is done with deterministic automata [27], but we do it here with NWAs to avoid an exponential blow-up. For a word $w \in \Sigma_t^*$, we denote with $[w]_{A_t}$ the congruence class to which $w$ belongs. Each congruence class is characterized by a binary relation $R \subseteq S_t \times S_t$, where the congruence class associated with $R$ is $C_R = \{w \in \Sigma_t^* \mid p_2 \in \delta_t(p_1, w) \text{ iff } (p_1, p_2) \in R\}$. Intuitively, each word $w \in C_R$ connects $p_1$ to $p_2$ in $A_t$ for each pair $(p_1, p_2) \in R$, and for no other pair.

It follows immediately from the characterization of the congruence classes in terms of binary relations over the states of $A_t$ that the set of congruence classes is closed under concatenation. Specifically, for two congruence classes $C_{R_1}$ and $C_{R_2}$, with associated relations $R_1$ and $R_2$, respectively, every word in $C_{R_1} \cdot C_{R_2}$ has associated relation $R_1 \circ R_2$, i.e., $C_{R_1} \cdot C_{R_2} \subseteq C_{R_1 \circ R_2}$.† In particular, the set $\mathcal{R}$ of binary relations associated with the congruence classes satisfies $\mathcal{R} \subseteq 2^{S_t \times S_t}$. Let $R_\varepsilon = \{(p, p) \mid p \in S_t\}$ and $R_b = \{(p_1, p_2) \mid p_2 \in \delta_t(p_1, b)\}$, for each $b \in \Sigma_t$. Then, for each $R \in \mathcal{R}$, the congruence class $C_R$ associated with $R$ is accepted by the deterministic word automaton $A_R = (\mathcal{R}, \Sigma_t, R_\varepsilon, \delta_{\mathcal{R}}, \{R\})$, where $\delta_{\mathcal{R}}(R', b) = R' \circ R_b$, for each $R' \in \mathcal{R}$ and $b \in \Sigma_t$. Notice that, if $A_t$ has $m$ states, then the number of states of $A_R$ is at most $2^{m^2}$.
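The deterministic automaton $A_R$ can be sketched in Python by representing its states as frozen sets of pairs and its transitions as relational composition. Exploring the relations reachable from $R_\varepsilon$ also enumerates the nonempty congruence classes; for the hypothetical four-state NWA for $q_t = 00 + 01 + 10$ used here there are five of them, well below the bound $2^{m^2} = 2^{16}$. The encoding and names are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

Relation = FrozenSet[Tuple[str, str]]
Delta = Dict[Tuple[str, str], Set[str]]


def compose(r1: Relation, r2: Relation) -> Relation:
    """R1 o R2 = {(p, q) | (p, m) in R1 and (m, q) in R2 for some m}."""
    return frozenset((p, q) for (p, m1) in r1 for (m2, q) in r2 if m1 == m2)


def letter_relation(states: Set[str], delta: Delta, b: str) -> Relation:
    """R_b = {(p1, p2) | p2 in delta_t(p1, b)}."""
    return frozenset((p, q) for p in states for q in delta.get((p, b), set()))


def realized_relations(states: Set[str], delta: Delta, alphabet: str) -> Set[Relation]:
    """All relations reachable from R_eps via R' -> R' o R_b, i.e. the relations of the
    nonempty congruence classes; these are the reachable states of every A_R."""
    r_eps = frozenset((p, p) for p in states)
    seen, frontier = {r_eps}, [r_eps]
    while frontier:
        r = frontier.pop()
        for b in alphabet:
            nxt = compose(r, letter_relation(states, delta, b))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def in_class(states: Set[str], delta: Delta, target: Relation, word: str) -> bool:
    """Run the DWA A_R for R = target: start in R_eps, apply R' -> R' o R_b per letter,
    and accept iff the final state is R."""
    r = frozenset((p, p) for p in states)
    for b in word:
        r = compose(r, letter_relation(states, delta, b))
    return r == target


# Hypothetical NWA A_t for q_t = 00 + 01 + 10 (initial state "s", final state "f").
STATES = {"s", "x", "y", "f"}
DELTA: Delta = {("s", "0"): {"x"}, ("s", "1"): {"y"},
                ("x", "0"): {"f"}, ("x", "1"): {"f"}, ("y", "0"): {"f"}}
```

The five classes are $\{\varepsilon\}$, $\{0\}$, $\{1\}$, $\{00, 01, 10\}$ (relation $\{(s, f)\}$), and the class of all remaining words (empty relation).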

Let us consider the (non-exact) view-synthesis problem. We observe first that we need to allow for the presence of empty queries for the views. Consider, e.g., $q_s = (a_1 + a_3) \cdot (a_2 + a_3)$ and $q_t = b_1 \cdot b_2$. It is easy to see that the only views capturing $q_s \rightsquigarrow q_t$ are

$V(a_1) = b_1$, $V(a_2) = b_2$, $V(a_3) = \emptyset$.

Observe also that $b_1 = [b_1]_{A_t}$ and $b_2 = [b_2]_{A_t}$, where $A_t$ is the obvious NWA for $b_1 \cdot b_2$.

We now prove two lemmas that will be used in the following. The first lemma states that, w.l.o.g., we can restrict attention to views capturing the mapping that are singleton views, i.e., views that are either empty or constituted by a single word.

†We use $L_1 \cdot L_2$ to denote concatenation between languages, and $R_1 \circ R_2$ to denote composition of binary relations.



Lemma 12. Let $q_s$ be an RPQ over $\Sigma_s \cup \Sigma_t$, and $q_t$ an RPQ over $\Sigma_t$. If there exist RPQ views $V$ capturing $q_s \rightsquigarrow q_t$, then there exist views $V'$ capturing $q_s \rightsquigarrow q_t$ such that each view in $V'$ is either a single word over $\Sigma_t$ or empty.

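Proof. Since $q_s[V] \neq \emptyset$ and $q_s[V] \subseteq q_t$, there exist a word $a_1 \cdots a_k \in \mathcal{L}(q_s)$ and a word $w_1 \cdots w_k \in \mathcal{L}(q_s[V])$, and hence in $\mathcal{L}(A_t)$, where $w_j \in \mathcal{L}(V(a_j))$. To define new views $V'$, we consider, for each $a \in \Sigma_s$ appearing in $a_1 \cdots a_k$, a word $w^a \in \mathcal{L}(V(a))$ and set $V'(a) = w^a$. Instead, for each $a \in \Sigma_s$ not appearing in $a_1 \cdots a_k$, we set $V'(a) = \emptyset$. Now, $q_s[V']$ is nonempty by construction, and since $\mathcal{L}(V'(a)) \subseteq \mathcal{L}(V(a))$ for every $a \in \Sigma_s$, we have that $q_s[V'] \subseteq q_s[V] \subseteq q_t$. □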

The next lemma shows that one can close views under 
congruence. 

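Lemma 13. Let $q_s$ be an RPQ over $\Sigma_s \cup \Sigma_t$, $q_t$ an RPQ over $\Sigma_t$ expressed through an NWA $A_t$, and $V$ singleton views capturing $q_s \rightsquigarrow q_t$. Then $V'$ defined such that

$$\mathcal{L}(V'(a)) = \begin{cases} [w^a]_{A_t} & \text{if } V(a) = w^a,\\ \emptyset & \text{if } V(a) = \emptyset, \end{cases}$$

captures $q_s \rightsquigarrow q_t$.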



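Proof. Consider a word $a_1 \cdots a_h \in \mathcal{L}(q_s)$. If there is one of the $a_i$ such that $V(a_i) = \emptyset$, then $\mathcal{L}(V'(a_1)) \cdots \mathcal{L}(V'(a_h)) = \emptyset \subseteq \mathcal{L}(q_t)$. Otherwise, we have that $\mathcal{L}(V(a_i)) = \{w^{a_i}\}$, for $i \in \{1, \ldots, h\}$, and since $w^{a_1} \cdots w^{a_h} \in \mathcal{L}(q_s[V]) \subseteq \mathcal{L}(A_t)$, there is a sequence $p_0, p_1, \ldots, p_h$ of states of $A_t$ such that $p_0 = p_t$, $p_h \in F_t$, and $p_i \in \delta_t(p_{i-1}, w^{a_i})$, for $i \in \{1, \ldots, h\}$. Consider now, for each $i \in \{1, \ldots, h\}$, a word $w'_i \in \mathcal{L}(V'(a_i)) = [w^{a_i}]_{A_t}$. Making use of the characterization of $[w^{a_i}]_{A_t}$ in terms of a binary relation over $S_t$, we have for each word in $[w^{a_i}]_{A_t}$, and in particular for $w'_i$, that $p_i \in \delta_t(p_{i-1}, w'_i)$. Hence, $p_h \in \delta_t(p_t, w'_1 \cdots w'_h)$ and $w'_1 \cdots w'_h \in \mathcal{L}(q_t)$. □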

From these two lemmas we get that, when searching for views capturing the mappings, we can restrict attention to views that are either empty or congruence classes for $A_t$.

Lemma 14. Let $q_s$ be an RPQ over $\Sigma_s \cup \Sigma_t$, and $q_t$ an RPQ over $\Sigma_t$ expressed through an NWA $A_t$. If there exist RPQ views $V$ over $\Sigma_t$ capturing $q_s \rightsquigarrow q_t$, then there exist views $V'$ capturing $q_s \rightsquigarrow q_t$ such that each view in $V'$ is either empty or a congruence class for $A_t$.

Proof. If there exist RPQ views $V$ over $\Sigma_t$ capturing $q_s \rightsquigarrow q_t$, then by Lemma 12, w.l.o.g., we can assume that the views in $V$ are singleton views. Then, the claim follows from Lemma 13. □

From the above lemma, we can immediately derive an ExpTime procedure for view existence, which gives us an alternative proof of Theorem 10. We first observe that, for an NWA $A_t$ with $m$ states, each view defined by a congruence class $C_R$ for $A_t$ can be represented by the automaton $A_R$, which has at most $2^{m^2}$ states. For a set $V$ of views that are congruence classes, we can test whether $q_s[V] \neq \emptyset$ and $q_s[V] \subseteq q_t$ by

• substituting each $a$-transition in the NWA $A_s$ for $q_s$ with the automaton $A_{R_a}$, where $C_{R_a} = \mathcal{L}(V(a))$, thus obtaining an NWA $A_{s,V}$;

• complementing $A_t$, obtaining an NWA $\overline{A_t}$; and

• checking the nonemptiness of $A_{s,V}$ and the emptiness of $A_{s,V} \times \overline{A_t}$.

Such a test can be done in time polynomial in the size of $A_s$ and exponential in the size of $A_t$.

Considering that the number of distinct congruence classes is at most $2^{m^2}$, the number of possible assignments of congruence classes to the $n$ view symbols occurring in $q_s$ is at most $2^{n \cdot m^2}$. For each such assignment defining views $V$, we need to test whether $q_s[V] \neq \emptyset$ and $q_s[V] \subseteq q_t$. Hence the overall check for the view-existence problem requires time exponential in the size of $A_t$, polynomial in the size of $A_s$, and exponential in the number of source symbols occurring in $q_s$.
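The whole procedure can be sketched in Python. Instead of building $A_{s,V}$ and complementing $A_t$ as in the bullets above, this sketch uses an equivalent shortcut that relies on every word of a congruence class $C_R$ inducing the same relation $R$: it explores pairs (state of $A_s$, relation over $S_t$) and requires that every pair reached in a final state of $A_s$ relate $p_t$ to some state of $F_t$. Candidate classes are the relations reachable from $R_\varepsilon$, and `None` stands for an empty view. The encodings and function names are assumptions for illustration, and the example uses $q_s = a_1 \cdot a_2$ together with a hypothetical four-state NWA for $q_t = 00 + 01 + 10$.

```python
from collections import deque
from itertools import product


def compose(r1, r2):
    """Relational composition R1 o R2."""
    return frozenset((p, q) for (p, m1) in r1 for (m2, q) in r2 if m1 == m2)


def letter_relation(states, delta, b):
    """R_b = {(p1, p2) | p2 in delta_t(p1, b)}."""
    return frozenset((p, q) for p in states for q in delta.get((p, b), set()))


def realized_relations(states, delta, alphabet):
    """Relations of all nonempty congruence classes of A_t (reachable from R_eps)."""
    r_eps = frozenset((p, p) for p in states)
    seen, frontier = {r_eps}, [r_eps]
    while frontier:
        r = frontier.pop()
        for b in alphabet:
            nxt = compose(r, letter_relation(states, delta, b))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def captures(src, tgt, assignment):
    """Test q_s[V] != {} and q_s[V] <= q_t, where V(a) = C_R for R = assignment[a]
    and assignment[a] = None encodes an empty view."""
    s0, s_delta, s_finals = src
    t_states, t_delta, t0, t_finals = tgt
    start = (s0, frozenset((p, p) for p in t_states))
    seen, queue = {start}, deque([start])
    nonempty = False
    while queue:
        s, r = queue.popleft()
        if s in s_finals:
            nonempty = True
            # every word with relation r is accepted iff r links t0 to a final state
            if not any((t0, f) in r for f in t_finals):
                return False
        for (s_src, c), targets in s_delta.items():
            if s_src != s:
                continue
            if c in assignment:            # source letter: substitute its view
                if assignment[c] is None:  # empty view contributes no words
                    continue
                r2 = compose(r, assignment[c])
            else:                          # target letter: kept as is
                r2 = compose(r, letter_relation(t_states, t_delta, c))
            for s2 in targets:
                if (s2, r2) not in seen:
                    seen.add((s2, r2))
                    queue.append((s2, r2))
    return nonempty


def view_exists(src, tgt, source_letters, target_alphabet):
    """Try every assignment of congruence classes (or the empty view) to source letters."""
    t_states, t_delta, _, _ = tgt
    candidates = list(realized_relations(t_states, t_delta, target_alphabet)) + [None]
    for choice in product(candidates, repeat=len(source_letters)):
        assignment = dict(zip(source_letters, choice))
        if captures(src, tgt, assignment):
            return assignment
    return None


# q_s = a1 . a2 and a hypothetical NWA for q_t = 00 + 01 + 10.
T_STATES = {"s", "x", "y", "f"}
T_DELTA = {("s", "0"): {"x"}, ("s", "1"): {"y"},
           ("x", "0"): {"f"}, ("x", "1"): {"f"}, ("y", "0"): {"f"}}
SRC = (0, {(0, "a1"): {1}, (1, "a2"): {2}}, {2})
TGT = (T_STATES, T_DELTA, "s", {"f"})
```

The loop in `view_exists` is the exponential enumeration over assignments; each call to `captures` explores at most $|S_s| \cdot 2^{m^2}$ pairs.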

The technique presented here, based on congruence classes, can also be adapted to address the exact view-existence problem. The difference wrt (non-exact) view existence is that in this case we need to consider also views that are unions of congruence classes. Indeed, congruence classes (and hence rewritings) are not closed under union, as shown by the following example.

Let $q_s = a_1 \cdot a_2$ and $q_t = 00 + 01 + 10$. Then the following two sets of incomparable views maximally capture $q_s \rightsquigarrow q_t$:

$V_1(a_1) = 0$, $V_1(a_2) = 0 + 1$;
$V_2(a_1) = 0 + 1$, $V_2(a_2) = 0$.

Notice that the views $V$, where $V(a_i) = V_1(a_i) + V_2(a_i)$, for $i \in \{1, 2\}$, do not capture $q_s \rightsquigarrow q_t$, since $q_s[V]$ includes $11$.
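Since all languages in this example are finite, the claims can be checked directly by enumeration. The following Python sketch represents each view as a finite set of strings (an encoding chosen for illustration) and computes $q_s[V]$ as the concatenation $V(a_1) \cdot V(a_2)$.

```python
def concat(l1, l2):
    """Concatenation L1 . L2 of two finite languages given as sets of strings."""
    return {u + v for u in l1 for v in l2}


def rewrite(views):
    """q_s[V] for q_s = a1 . a2."""
    return concat(views["a1"], views["a2"])


Q_T = {"00", "01", "10"}
V1 = {"a1": {"0"}, "a2": {"0", "1"}}
V2 = {"a1": {"0", "1"}, "a2": {"0"}}
V_UNION = {a: V1[a] | V2[a] for a in ("a1", "a2")}

for name, views in (("V1", V1), ("V2", V2), ("V1 + V2", V_UNION)):
    print(name, sorted(rewrite(views)), rewrite(views) <= Q_T)
```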

On the other hand, we can show that considering views that are unions of congruence classes is sufficient to obtain maximal unfoldings. We first generalize Lemma 13 to non-singleton views.

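Lemma 15. Let $q_s$ be an RPQ over $\Sigma_s \cup \Sigma_t$, $q_t$ an RPQ over $\Sigma_t$ expressed through an NWA $A_t$, and $V$ a set of views capturing $q_s \rightsquigarrow q_t$. Then $V'$ defined such that

$$\mathcal{L}(V'(a)) = \begin{cases} \bigcup_{w \in \mathcal{L}(V(a))} [w]_{A_t} & \text{if } V(a) \neq \emptyset,\\ \emptyset & \text{if } V(a) = \emptyset, \end{cases}$$

captures $q_s \rightsquigarrow q_t$.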



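Proof. Consider a word $a_1 \cdots a_h \in \mathcal{L}(q_s)$. If there is one of the $a_i$ such that $V(a_i) = \emptyset$, then $\mathcal{L}(V'(a_1)) \cdots \mathcal{L}(V'(a_h)) = \emptyset \subseteq \mathcal{L}(q_t)$. Otherwise, we have that, for $i \in \{1, \ldots, h\}$ and for some $w^{a_i} \in \mathcal{L}(V(a_i))$, the word $w^{a_1} \cdots w^{a_h} \in \mathcal{L}(q_s[V]) \subseteq \mathcal{L}(A_t)$. We show that, for each $i \in \{1, \ldots, h\}$, we also have that $w^{a_1} \cdots w^{a_{i-1}} \cdot w' \cdot w^{a_{i+1}} \cdots w^{a_h} \in \mathcal{L}(A_t)$, for each $w' \in \bigcup_{w \in \mathcal{L}(V(a_i))} [w]_{A_t}$. First, by definition of rewriting, if $w^{a_1} \cdots w^{a_h} \in \mathcal{L}(q_s[V]) \subseteq \mathcal{L}(A_t)$, then, for each $w \in \mathcal{L}(V(a_i))$, we also have that $w^{a_1} \cdots w^{a_{i-1}} \cdot w \cdot w^{a_{i+1}} \cdots w^{a_h} \in \mathcal{L}(q_s[V]) \subseteq \mathcal{L}(A_t)$. Then there is a sequence $p_0, p_1, \ldots, p_h$ of states of $A_t$ such that $p_0 = p_t$, $p_h \in F_t$, $p_j \in \delta_t(p_{j-1}, w^{a_j})$, for $j \in \{1, \ldots, i-1, i+1, \ldots, h\}$, and $p_i \in \delta_t(p_{i-1}, w)$. Then, by the definition of congruence classes, for each word $w' \in [w]_{A_t}$, we have that $p_i \in \delta_t(p_{i-1}, w')$, and hence $w^{a_1} \cdots w^{a_{i-1}} \cdot w' \cdot w^{a_{i+1}} \cdots w^{a_h} \in \mathcal{L}(A_t)$. Since the choice of the words $w^{a_j} \in \mathcal{L}(V(a_j))$ was arbitrary, applying this argument to one position at a time shows that every word in $\mathcal{L}(V'(a_1)) \cdots \mathcal{L}(V'(a_h))$ is in $\mathcal{L}(A_t)$. Moreover, $q_s[V']$ is nonempty, since $\mathcal{L}(V(a)) \subseteq \mathcal{L}(V'(a))$ for every $a \in \Sigma_s$. □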



The above lemma implies that, when searching for views that maximally capture the mappings, we can restrict attention to views that are unions of congruence classes.



Lemma 16. Given a mapping $m = q_s \rightsquigarrow q_t$, where $q_t$ is defined by an NWA $A_t$, every set of views $V$ that maximally captures $m$ is such that each view in $V$ is a union of congruence classes for $A_t$.

Proof. Consider a set of views $V$ that maximally captures $m$, and assume that, for some $a \in \Sigma_s$, $V(a)$ is not a union of congruence classes for $A_t$. Then there is some word $w \in \mathcal{L}(V(a))$ and some word $w' \in [w]_{A_t}$ such that $w' \notin \mathcal{L}(V(a))$. By Lemma 15, the set of views $V'$ with $\mathcal{L}(V'(a)) = \mathcal{L}(V(a)) \cup \{w'\}$ also captures $m$, thus contradicting the maximality of $V$. □

We get the following upper bound for the exact view-existence problem.

Theorem 17. In the case where $Q_s$, $Q_t$, and $Q_v$ are RPQs, the exact view-existence problem is in ExpSpace.

Proof. By Lemma 16, we can nondeterministically choose views $V$ that are unions of congruence classes and then test whether $q_t = q_s[V]$ (we assume that $q_t \neq \emptyset$; otherwise the problem trivializes). To do so, we build an NWA $A_{s,V}$ accepting $\mathcal{L}(q_s[V])$ as follows. We start by observing that, for each union $U$ of congruence classes (identified with the set of relations of its classes), we can build the automaton $A_U = (\mathcal{R}, \Sigma_t, R_\varepsilon, \delta_{\mathcal{R}}, U)$ accepting the words in $U$, which, incidentally, is deterministic. Hence, by substituting each $a$-transition in the NWA $A_s$ for $q_s$ with the automaton $A_{U_a}$, where $V(a) = U_a$, we obtain an NWA $A_{s,V}$. Note that, even when $A_s$ is deterministic, $A_{s,V}$ may be nondeterministic.

To test $q_s[V] \subseteq q_t$, we complement $A_t$, obtaining the NWA $\overline{A_t}$, and check the NWA $A_{s,V} \times \overline{A_t}$ for emptiness. The size of $A_{s,V} \times \overline{A_t}$ is polynomial in the size of $A_s$ and exponential in the size of $A_t$. Checking for emptiness can be done in exponential time, and, considering the initial nondeterministic guess, we get a NExpTime upper bound.

To test $q_t \subseteq q_s[V]$, we complement $A_{s,V}$, obtaining the NWA $\overline{A_{s,V}}$, and check $A_t \cap \overline{A_{s,V}}$ for emptiness. Since $A_{s,V}$ is nondeterministic, complementation is exponential. However, we observe that such a complementation can be done on the fly in ExpSpace, while checking for emptiness and intersecting with $A_t$. As a consequence, considering the initial nondeterministic guess, exact view existence can be decided in NExpSpace, which is equivalent to ExpSpace by Savitch's theorem. □
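The substitution that produces $A_{s,V}$ in the proof of Theorem 17 can be illustrated with a small Python sketch that simulates $A_{s,V}$ on a given word, tracking sets of configurations on the fly rather than materializing the automaton. A configuration is either a state of $A_s$ or a position inside the copy of $A_{U_a}$ that replaces an $a$-transition, where the state of that copy is a relation over $S_t$ updated by $R' \mapsto R' \circ R_b$. The encodings and names are assumptions for illustration; the example uses $q_s = a_1 \cdot a_2$, a hypothetical four-state NWA for $q_t = 00 + 01 + 10$, and the views $V_1(a_1) = 0 = C_{R_0}$ and $V_1(a_2) = 0 + 1 = C_{R_0} \cup C_{R_1}$.

```python
def compose(r1, r2):
    """Relational composition R1 o R2."""
    return frozenset((p, q) for (p, m1) in r1 for (m2, q) in r2 if m1 == m2)


def letter_relation(states, delta, b):
    """R_b = {(p1, p2) | p2 in delta_t(p1, b)}."""
    return frozenset((p, q) for p in states for q in delta.get((p, b), set()))


def substituted_accepts(src, tgt, unions, word):
    """Decide whether `word` (over Sigma_t) is accepted by A_{s,V}, where V(a) is the union
    of congruence classes whose relations are in unions[a].

    Configurations: ("src", s) for a state s of A_s, and ("in", a, s2, R) for the copy of
    A_{U_a} that replaces an a-transition of A_s leading to s2, currently in state R."""
    s0, s_delta, s_finals = src
    t_states, t_delta = tgt
    r_eps = frozenset((p, p) for p in t_states)

    def closure(configs):
        # epsilon moves: enter a copy of A_{U_a}, or leave it when its state R is in U_a
        stack, seen = list(configs), set(configs)
        while stack:
            cfg = stack.pop()
            if cfg[0] == "src":
                new = [("in", c, s2, r_eps)
                       for (s, c), targets in s_delta.items()
                       if s == cfg[1] and c in unions
                       for s2 in targets]
            else:
                _, a, s2, r = cfg
                new = [("src", s2)] if r in unions[a] else []
            for n in new:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen

    current = closure({("src", s0)})
    for b in word:
        nxt = set()
        for cfg in current:
            if cfg[0] == "src":  # target letters of q_s are read directly
                nxt.update(("src", s2) for s2 in s_delta.get((cfg[1], b), set()))
            else:                # inside a view copy: R' -> R' o R_b
                _, a, s2, r = cfg
                nxt.add(("in", a, s2, compose(r, letter_relation(t_states, t_delta, b))))
        current = closure(nxt)
    return any(cfg[0] == "src" and cfg[1] in s_finals for cfg in current)


T_STATES = {"s", "x", "y", "f"}
T_DELTA = {("s", "0"): {"x"}, ("s", "1"): {"y"},
           ("x", "0"): {"f"}, ("x", "1"): {"f"}, ("y", "0"): {"f"}}
R0 = letter_relation(T_STATES, T_DELTA, "0")  # relation of the class {0}
R1 = letter_relation(T_STATES, T_DELTA, "1")  # relation of the class {1}
SRC = (0, {(0, "a1"): {1}, (1, "a2"): {2}}, {2})  # q_s = a1 . a2
V1 = {"a1": {R0}, "a2": {R0, R1}}                 # V1(a1) = 0, V1(a2) = 0 + 1
```

Tracking sets of configurations is exactly the subset construction that the proof performs on the fly when complementing $A_{s,V}$.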

7. EXTENSIONS 

In this section we sketch the extension of the results of the previous sections to more expressive classes of queries: 2RPQs, CRPQs, UCRPQs, and UC2RPQs.

7.1 2RPQs

Consider now the view-synthesis problem for the case where $Q_s$, $Q_t$, and $Q_v$ are 2RPQs, expressed by means of NWAs over the alphabets $\Sigma^\pm$ and $\Delta^\pm$.

A key concept for 2RPQs is that of folding. Let $u, v \in (\Sigma^\pm)^*$. We say that $v$ folds onto $u$, denoted $v \hookrightarrow u$, if $v$ can be "folded" onto $u$, e.g., $abb^-bc \hookrightarrow abc$. Formally, we say that $v = v_1 \cdots v_m$ folds onto $u = u_1 \cdots u_n$ if there is a sequence $i_0, \ldots, i_m$ of integers between $0$ and $n$ such that

• $i_0 = 0$ and $i_m = n$, and

• for $j \in \{0, \ldots, m-1\}$, either $i_{j+1} = i_j + 1$ and $v_{j+1} = u_{i_j + 1}$, or $i_{j+1} = i_j - 1$ and $v_{j+1} = u_{i_j}^-$.
Let $L$ be a language over $\Sigma^\pm$. We define $\mathit{fold}(L) = \{u \mid v \hookrightarrow u \text{ for some } v \in L\}$.
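The definition of folding translates directly into a simple dynamic program that tracks the set of positions $i_j$ that can be reached after reading $v_1 \cdots v_j$. The Python sketch below is illustrative: it encodes words as lists of symbol names, writes the inverse $b^-$ of a symbol $b$ as `"b-"` (an encoding chosen here), and uses 0-based list indexing, so $u_{i+1}$ is `u[i]`.

```python
from typing import List


def inverse(symbol: str) -> str:
    """Inverse of a symbol: b <-> b- (the trailing '-' marks an inverse symbol)."""
    return symbol[:-1] if symbol.endswith("-") else symbol + "-"


def folds_onto(v: List[str], u: List[str]) -> bool:
    """True iff v folds onto u, following the definition of the sequence i_0, ..., i_m."""
    positions = {0}  # possible values of i_j after reading v_1 ... v_j; i_0 = 0
    for symbol in v:
        nxt = set()
        for i in positions:
            if i < len(u) and symbol == u[i]:          # i_{j+1} = i_j + 1, v_{j+1} = u_{i_j + 1}
                nxt.add(i + 1)
            if i > 0 and symbol == inverse(u[i - 1]):  # i_{j+1} = i_j - 1, v_{j+1} = u_{i_j}^-
                nxt.add(i - 1)
        positions = nxt
    return len(u) in positions  # i_m = n
```

For instance, `folds_onto(["a", "b", "b-", "b", "c"], ["a", "b", "c"])` holds, matching the example $abb^-bc \hookrightarrow abc$.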

A language-theoretic characterization of containment of 2RPQs was provided in [10]:

Lemma 18. Let $q_1$ and $q_2$ be 2RPQs. Then $q_1 \subseteq q_2$ iff $\mathcal{L}(q_1) \subseteq \mathit{fold}(\mathcal{L}(q_2))$.

Furthermore, it is shown in [10] that if $A$ is an $n$-state NWA over $\Sigma^\pm$, then there is a 2NWA for $\mathit{fold}(\mathcal{L}(A))$ with $n \cdot (|\Sigma^\pm| + 1)$ states. (We use 2NWA to refer to two-way automata on words.)

In the view-existence problem, we are given queries $q_s$ and $q_t$, expressed as NWAs $A_s$ and $A_t$, and we are asked whether there exist nonempty 2RPQ views $V$ such that $q_s[V] \subseteq q_t$ and $q_s[V] \neq \emptyset$. We can use Lemma 18 for the tree-automata solution. A labeled tree $V : (\Sigma^\pm)^* \to \{0, 1\}^n$ represents candidate views. To check that $q_s[V] \subseteq q_t$, we check that $\mathcal{L}(A_s[V]) \subseteq \mathit{fold}(\mathcal{L}(A_t))$. We now proceed as in Section 5, using the 2NWA for $\mathit{fold}(\mathcal{L}(A_t))$ instead of $A_t$. This requires first converting the 2NWA to an NWA with an exponential blow-up, increasing the overall complexity to 2ExpTime.

We can also use Lemma 18 for the congruence-based solution. Here, too, a simplistic approach would be to convert the 2NWA for $\mathit{fold}(\mathcal{L}(A_t))$ into an NWA, with an exponential blow-up, and proceed as in Section 6. To avoid this exponential blow-up, we need an exponential bound on the number of congruence classes. For an NWA, we saw that each congruence class can be defined in terms of a binary relation over its set of states. It turns out that, for a 2NWA $A$, a congruence class can be defined in terms of four binary relations over the set $S$ of states of $A$:

1. $R_{lr}$: a pair $(p_1, p_2) \in R_{lr}$ means that each word $w$ of the class leads $A$ from $p_1$ to $p_2$, where $w$ is entered on the left and exited on the right.

2. $R_{rl}$: a pair $(p_1, p_2) \in R_{rl}$ means that each word $w$ of the class leads $A$ from $p_1$ to $p_2$, where $w$ is entered on the right and exited on the left.

3. $R_{ll}$: a pair $(p_1, p_2) \in R_{ll}$ means that each word $w$ of the class leads $A$ from $p_1$ to $p_2$, where $w$ is entered on the left and exited on the left.

4. $R_{rr}$: a pair $(p_1, p_2) \in R_{rr}$ means that each word $w$ of the class leads $A$ from $p_1$ to $p_2$, where $w$ is entered on the right and exited on the right.

Thus, the number of congruence classes when $A$ has $m$ states is at most $2^{4m^2}$ rather than $2^{m^2}$, which is still exponential. This enables us to adapt the technique of Section 6 with essentially the same complexity bounds.

Theorem 19. In the case where $Q_s$, $Q_t$, and $Q_v$ are 2RPQs, the view-existence problem is in ExpTime and the exact view-existence problem is in ExpSpace.

7.2 CRPQs 

Consider now the view-synthesis problem for the case where $Q_s$ and $Q_t$ are CRPQs, where the constituent RPQs are expressed by means of NWAs. Here the views have to be RPQs, rather than CRPQs, since CRPQs are not closed under substitutions. Thus, we can still represent views in terms of a labeled tree $V : \Sigma^* \to \{0, 1\}^n$. The crux of our approach is to reduce containment of two CRPQs, $q_1$ and $q_2$, to containment of standard languages. Let $q_h$, for $h \in \{1, 2\}$, be of the form

$$q_h(x_1, \ldots, x_n) \leftarrow q_{h,1}(y_{h,1}, y_{h,2}) \wedge \cdots \wedge q_{h,m_h}(y_{h,2m_h-1}, y_{h,2m_h})$$

and let $V_1$, $V_2$ be the sets of variables of $q_1$ and $q_2$, respectively. It is shown in [7] that the containment $q_1 \subseteq q_2$ can be reduced to the containment $\mathcal{L}(A_1) \subseteq \mathcal{L}(A_2)$ of two word automata $A_1$ and $A_2$. $A_1$ is an NWA, whose size is exponential in $q_1$, and it accepts certain words of the form

$$d_1 w_1 d_2 \,\$\, d_3 w_2 d_4 \,\$\, \cdots \,\$\, d_{2m_1-1} w_{m_1} d_{2m_1}$$

where each $d_i$ is a subset of $V_1$ and the words $w_i$ are over the alphabet of $A_1$. Such words constitute a linear representation of certain semistructured databases that are canonical for $q_1$ in some sense. $A_2$ is a 2NWA, whose size is exponential in the size of $q_2$, and it accepts words of the above form if there is an appropriate mapping from $q_2$ to the database represented by these words. The reduction of the containment $q_1 \subseteq q_2$ to $\mathcal{L}(A_1) \subseteq \mathcal{L}(A_2)$ is shown in [7].

We can now adapt the tree-automata technique of Section 5. From $q_s$ and $q_t$ we can construct word automata $A_s$ and $A_t$ as in [7]. We now ask whether there are nonempty RPQ views $V$ such that $\mathcal{L}(A_s[V]) \subseteq \mathcal{L}(A_t)$. This can be done as in Section 5, after converting $A_t$ to an NWA.

The ability to reduce containment of CRPQs to containment of word automata means that we can also apply the congruence-class technique of Section 6. Suppose that we have nonempty RPQ views $V$ such that $\mathcal{L}(A_s[V]) \subseteq \mathcal{L}(A_t)$. Then we can again assume that the views are closed with respect to the congruence classes of $A_t$. Thus, the techniques of Section 6 can be applied.

Theorem 20. In the case where $Q_s$, $Q_t$, and $Q_v$ are CRPQs, the view-existence problem is in 2ExpTime, and the exact view-existence problem is in 2ExpSpace.

7.3 UC2RPQs

Here we allow both C2RPQs and unions. Since UC2RPQs are not closed under substitutions, we consider here 2RPQ views. The extension to handle unions is straightforward. To handle C2RPQs, we need to combine the techniques of Sections 7.1 and 7.2. The key idea is the reduction of query containment to containment of word automata. The resulting upper bounds are identical to those we obtained for CRPQs.

8. CONCLUSIONS 

In this paper we have addressed the issue of synthesizing 
a set of views starting from a collection of mappings relating 
a source schema to a target schema. 

We have argued that the problem is relevant in several scenarios, especially data warehousing, data integration and mashups, and data exchange. We have provided a formalization of the problem based on query rewriting, and we have presented techniques and complexity upper bounds for two cases, namely, relational data and graph-based semistructured data. We concentrated on the basic problem of view existence, and we have shown that in both cases the problem is decidable, with different complexity upper bounds depending on the types of query languages used in the mappings and the views, and on the variant (sound or exact rewriting) of the problem.
We plan to continue investigating the view-synthesis problem along different directions. First, we aim at deriving lower complexity bounds for the view-existence problem. Second, we are interested in studying view synthesis for tree-based (e.g., XML) semistructured data. Finally, while in this paper we have based the notion of view synthesis on query rewriting, it would be interesting to explore a variant of this notion, based on query answering using views. In this variant, views $V$ capture a mapping of the form $q_s \rightsquigarrow q_t$ if, for each source database $D_s$, the query $q_s$ computes the certain answers to $q_t$ wrt $V$ and $D_s$ [11].